
Integrate ndarray-parallel and make rayon an optional feature #563

Merged
24 commits merged into master from integrate-rayon on Dec 3, 2018

Conversation

@bluss (Member) commented Nov 24, 2018

This moves all of the ndarray-parallel functionality directly into ndarray itself.

This means the parallelization features are more visible (in the docs by default) and more accessible. The user still has to opt in to use them and bring the rayon traits into scope.

We also get rid of the extra shims in the integration: we use the rayon IntoParallel* traits directly and can provide inherent methods for par_apply and so on.

Fixes #551
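
For illustration, a minimal sketch of the kind of usage this enables (hypothetical example: it assumes the `rayon` crate feature is enabled, and the shapes and values are made up):

use ndarray::{Array2, Zip};

fn main() {
    let mut a = Array2::<f64>::zeros((128, 128));
    let b = Array2::<f64>::ones((128, 128));

    // With the `rayon` feature enabled, `par_apply` is an inherent method on `Zip`,
    // so no extension trait has to be imported for this call.
    Zip::from(&mut a).and(&b).par_apply(|a, b| *a += *b);

    assert_eq!(a[(0, 0)], 1.0);
}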

@bluss (Member, Author) commented Nov 25, 2018

Here I'm wondering: should we keep the parallel prelude separate, or pack it into the regular ndarray prelude?

It only contains this:

pub mod prelude {
    pub use rayon::prelude::{
        ParallelIterator, IndexedParallelIterator, IntoParallelIterator,
        IntoParallelRefIterator, IntoParallelRefMutIterator,
    };
}
  1. I think it's a good idea to have these traits in a prelude somewhere; we don't want the user to have to `extern crate rayon` or add rayon as a direct dependency if they are purely working in ndarray for the moment.
  2. I think we can move these into ndarray::prelude, conditional on the rayon feature. They will show up or not show up in the prelude depending on the feature flag, but given Rust's rules for glob imports they only collide if the items are actually used, so it shouldn't be an incompatibility (see the small sketch below). Thoughts?
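
As a standalone sketch of the glob-import rule relied on above (the modules here are made up and unrelated to the actual preludes): two globs may export an item with the same name, and the program still compiles as long as the ambiguous name is never used.

mod a {
    pub struct Axis;
    pub struct OnlyInA;
}
mod b {
    pub struct Axis;
}

// Both globs bring an `Axis` into scope; that alone is not an error.
use a::*;
use b::*;

fn main() {
    let _x = OnlyInA; // fine: this name is unambiguous
    // let _y = Axis; // would fail to compile: `Axis` is ambiguous between the globs
}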

@bluss (Member, Author) commented Nov 25, 2018

cc @cuviper Any thoughts about our integration with rayon and our parallel iterators? We're just moving code here and pulling in the rayon 1.0 dependency in the main crate, the way it's all supposed to be.

@jturner314 (Member) commented Nov 27, 2018

I think we can move these into ndarray::prelude? Conditional on the rayon feature.

I lean towards keeping them separate, but putting them in ndarray::prelude would be fine too.

Edit: Earlier, I suggested adding pub use rayon::prelude as parallel to ndarray::prelude, but I no longer think that's a good idea. IMO, if there's a ndarray::parallel module, it would be better to put the parallel prelude at ndarray::parallel::prelude.

Another interesting idea is to reexport all of rayon at the top level of ndarray:

// ndarray/src/lib.rs
pub use rayon;

although users could just depend directly on rayon instead, so I'm not sure if it's worth doing this.
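
If that re-export existed (hypothetical; it is only being proposed here), downstream code could reach rayon through ndarray without declaring its own rayon dependency, e.g.:

// Assumes ndarray did `pub use rayon;` at its crate root, as sketched above.
use ndarray::rayon::prelude::*;

fn main() {
    let sum: i32 = (0..1_000).into_par_iter().sum();
    assert_eq!(sum, 499_500);
}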

@bluss (Member, Author) commented Nov 27, 2018

Re-exporting rayon seems worthwhile.

@jturner314 (Member) commented:

One other idea for the prelude is to re-export the traits in ndarray::prelude so that their methods can be used easily, but hide them from the docs and mangle the trait names so that the originals need to be explicitly imported to be used directly:

// ndarray/src/prelude.rs
#[doc(hidden)]
pub use rayon::prelude::{
    ParallelIterator as __rayon_ParallelIterator,
    IndexedParallelIterator as __rayon_IndexedParallelIterator,
    IntoParallelIterator as __rayon_IntoParallelIterator,
    IntoParallelRefIterator as __rayon_IntoParallelRefIterator,
    IntoParallelRefMutIterator as __rayon_IntoParallelRefMutIterator,
};

I think std does this for some traits.
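
A hypothetical sketch of what that renaming would buy; note that this approach is only being floated here, and the example also assumes the `rayon` feature plus the parallel iterator impl for `&Array` discussed elsewhere in this PR:

use ndarray::prelude::*; // would bring the renamed rayon traits into scope

fn main() {
    let a = Array2::<f64>::ones((100, 100));
    // `par_iter` would resolve through the hidden `__rayon_IntoParallelRefIterator`
    // re-export, so the user never has to name the trait explicitly.
    let total: f64 = a.par_iter().sum();
    assert_eq!(total, 10_000.0);
}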

@bluss (Member, Author) commented Nov 27, 2018

We're merging breaking changes to prepare the next major release, right?

Taken together with this being an opt-in feature, putting the rayon traits in the regular prelude seems like a breaking-change / method-name-clash hazard.

I think it's pretty obvious we should have rayon as optional, but maybe I'm wrong?

@jturner314 (Member) commented:

We're merging breaking changes to prepare the next major release, right?

Yes, that makes sense to me.

I think it's pretty obvious we should have rayon as optional, but maybe I'm wrong?

I think it's a good idea to make rayon optional for the time being, given the difference in compile times; we can always make it required later.

@rcarson3 (Contributor) commented:

I think it's good to keep rayon optional even in the future. In my opinion, it's best not to force particular parallel dependencies on people. Groups can have very opinionated ideas about how best to parallelize their code. I've especially seen this in the C++ and C realm as you move toward exascale-type work, where you find very domain-specific solutions that aren't obvious but lead to a noticeable performance difference.

@bluss (Member, Author) commented Nov 28, 2018

@rcarson3 Fair enough. We don't have a plan for transparent/automatic parallelization anywhere, though; it's all just separate methods and parallel iterators one can use so far.

@rcarson3 (Contributor) commented Nov 28, 2018

we don't have a plan for transparent/automatic parallelization anywhere, though; it's all just separate methods and parallel iterators one can use so far.

@bluss I'm aware of this; I've had the pleasure of using a few of the parallel iterators to get some easy parallelization out of my code :)

I guess I've become a bit more cautious about requiring one parallelization approach over another, even for parallelizing iterators/compute kernels, as I've been exposed to more and more of the different ways people have found to tackle these kinds of problems in HPC applications.

I'll probably still continue to use this crate no matter what decisions y'all end up making way down the road for the parallel stuff, since I enjoy how easy this crate has made writing numerical code in Rust :)

@jturner314 (Member) commented:

@rcarson3 Out of curiosity, have you done any HPC-type stuff in Rust? If so, what crates were useful? I have a small SLURM cluster but haven't yet tried running any Rust code across multiple machines.

@rcarson3 (Contributor) commented:

@jturner314 I haven't gotten the chance to mess around with any HPC stuff in Rust yet. Hopefully, I'll get the chance to experiment with some of this stuff in the next month or so.

Also, I've been eyeballing rsmpi as a traditional way to do HPC-type work in Rust. It looks like some people from LANL are involved, which means we might start seeing Rust used at the DOE national labs, which is exciting. I've also been looking at timely-dataflow and differential-dataflow to try to figure out whether they could be used in a similar manner to what you might use MPI for.

@jturner314 (Member) commented:

@rcarson3 Thanks for the links. The dataflow projects look particularly exciting, and relatively easy to use.

@bluss changed the title from "WIP: Integrate ndarray-parallel and make rayon an optional feature" to "Integrate ndarray-parallel and make rayon an optional feature" on Dec 1, 2018
@bluss (Member, Author) commented Dec 1, 2018

This should be ready. It uses a separate parallel prelude, which contains the rayon traits and the par_azip macro.

@bluss (Member, Author) commented Dec 1, 2018

I think the bounds on the parallel iterators that produce &mut T are too strict. They currently use T: Send + Sync, but the corresponding IntoParallelIterator for &mut [T] in rayon only requires T: Send.
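
For context, a small rayon-only illustration (a standalone sketch, not ndarray code) of why `T: Send` alone should suffice for mutable parallel iteration: `Cell<i32>` is Send but not Sync, yet rayon can mutate a slice of them in parallel because each closure call receives an exclusive `&mut` reference.

use rayon::prelude::*;
use std::cell::Cell;

fn main() {
    // Cell<i32> is Send but deliberately !Sync.
    let mut data: Vec<Cell<i32>> = (0..1_000).map(Cell::new).collect();

    // rayon's &mut [T] iterator only needs T: Send, since workers get disjoint &mut access.
    data.par_iter_mut().for_each(|c| c.set(c.get() * 2));

    let total: i32 = data.iter().map(Cell::get).sum();
    assert_eq!(total, 999_000);
}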

@cuviper (Contributor) commented Dec 2, 2018

@bluss

Any thoughts about our integration with rayon and our parallel iterators?

Looks good to me!

I noticed that views and zips are unindexed -- AFAIK their length is known, so is this just because they're hard to split at arbitrary indexes? (I have the same limitation in rayon-hash.)

I think the bounds on the parallel iterators that produce &mut T are too strict. They currently use T: Send + Sync, but the corresponding IntoParallelIterator for &mut [T] in rayon only requires T: Send.

Is there a technical reason why you're requiring Sync? Or just an oversight?

@bluss (Member, Author) commented Dec 2, 2018

@cuviper They are impossible to split at arbitrary linear indexes if they have more than one dimension; i.e., a 3 × 3 × 3 array view should be split along a good combination of axis and index.

For example, along axis 0 at index 1 (that split produces the following pieces: views of 1 × 3 × 3 and 2 × 3 × 3; see also the illustrations).

And we want to split them in a way that preserves element locality. This is a dynamic property: maybe the elements are closest to each other along axis 1 and furthest from each other along axis 0, in which case we split along axis 0. We want to increase our chances of ending up with a contiguous chunk to work on for the final single job.

We can split successively on different axes; for example, when the first axis has been split down to length 1, we go on to the current axis of furthest separation with len > 1 and split along that. A rough sketch of this strategy is below.
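
A hypothetical sketch of that splitting strategy, simplified to a fixed 3-D view; the helper name and the stride heuristic here are illustrative, not the actual implementation:

use ndarray::{ArrayView3, Axis};

// Split a view roughly in half along the axis whose elements are farthest apart
// in memory (largest absolute stride), considering only axes of length > 1.
fn split_by_locality<T>(view: ArrayView3<'_, T>) -> Option<(ArrayView3<'_, T>, ArrayView3<'_, T>)> {
    let axis = view
        .strides()
        .iter()
        .zip(view.shape())
        .enumerate()
        .filter(|&(_, (_, &len))| len > 1)
        .max_by_key(|&(_, (&stride, _))| stride.abs())
        .map(|(i, _)| i)?;
    let mid = view.len_of(Axis(axis)) / 2;
    Some(view.split_at(Axis(axis), mid))
}

fn main() {
    // For a C-order 3 × 3 × 3 array the largest stride is along axis 0,
    // so the split reproduces the 1 × 3 × 3 / 2 × 3 × 3 example above.
    let data: Vec<f64> = (0..27).map(|x| x as f64).collect();
    let arr = ndarray::Array::from_shape_vec((3, 3, 3), data).unwrap();
    let (left, right) = split_by_locality(arr.view()).unwrap();
    assert_eq!(left.shape(), &[1, 3, 3]);
    assert_eq!(right.shape(), &[2, 3, 3]);
}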

What could be added is to make more producers indexed, in their 1-D cases only, but I'm unsure if that's possible. Say we have a Parallel<ArrayView<f64, D>>; when D = Ix1 we would additionally implement IndexedParallelIterator. Since the parallel traits cooperate a bit, I'm not sure if that works out.

I think the Sync thing is just an oversight / an old “quick solution to be safe”; I don't remember any reason we require it.

@cuviper (Contributor) commented Dec 2, 2018

we want to split them in a way that preserves element locality.

I see -- and I suppose direct Array access can just split the backing storage, regardless of the formal shape? If that's the case, then it seems like you could do the same for iterating &Array and &mut Array, rather than going through views where that contiguity is no longer guaranteed.

Edit: I think I mistook the macros for the indexed AxisIter part as being for Array directly. Do you not have IntoParallelIterator for Array at all?

What could be added is to make more producers indexed, in their 1-D cases only, but I'm unsure if that's possible. Say we have a Parallel<ArrayView<f64, D>>; when D = Ix1 we would additionally implement IndexedParallelIterator. Since the parallel traits cooperate a bit, I'm not sure if that works out.

The opt_len hack might be hard without specialization, unless you separately implemented each possible D for ParallelIterator, but that's just an optimization for collect to act like collect_vec.

I think it would work fine to just implement IndexedParallelIterator for the 1D case, and leave ParallelIterator as a more generic unindexed implementation. If that doesn't work for some reason, I'm willing to consider that as a rayon bug (whether or not it's possible to fix).

@jturner314 (Member) left a review:

Merging this stuff into ndarray makes things a lot nicer. Removing the Ndarray* traits is especially satisfying. :)

I've added some comments, primarily about docs. Some additional docs changes:

  • Remove "See also the [`ndarray-parallel`] crate for integration with rayon." from src/lib.rs.
  • Add the rayon feature flag to the list in src/lib.rs.

I'd also remove the parallel directory (or maybe delete everything except parallel/README.md and update it to say that the ndarray-parallel crate is deprecated). Doing so makes the Git diff a lot easier to read, since it lets Git recognize that most files were renamed with only small changes.

Review threads (resolved): src/parallel/mod.rs, src/parallel/zipmacro.rs, src/parallel/ext_traits.rs

One inline review comment was left on this snippet:
/// `Zip`.
///
/// Requires crate feature `rayon`.
pub fn par_apply<F>(self, function: F)
@jturner314 (Member) commented:

Consider moving par_apply out of this module and into src/zip/mod.rs immediately after apply. (This would make the docs somewhat clearer.)

@bluss (Member, Author) replied:

OK, this is left for later.

@bluss (Member, Author) commented Dec 3, 2018

@cuviper We have parallel iterators for array views and for &Array and &mut Array, but not for Array itself, just as we don't have a regular by-value iterator.

Implementing indexed parallel iterators is left for later. This PR is mostly about moving the code from a separate crate into the main crate.

@bluss merged commit ad714f8 into master on Dec 3, 2018
@bluss deleted the integrate-rayon branch on Dec 3, 2018
@cuviper (Contributor) commented Dec 3, 2018

We have parallel iterators for array views, for &Array and &mut Array, but not Array itself. Just like we don't have a regular by value iterator.

Implementing Indexed parallel iterators is left for later.

OK, but note that having <&Array as IntoParallelIterator>::Iter = Parallel<ArrayView> will restrict what you can do at the type level. Using something like Iter = Parallel<&Array> could still be implemented in terms of the view-style Producer from drive_unindexed, but would leave you the flexibility to switch to an indexed implementation later (without a breaking change).

@bluss (Member, Author) commented Dec 3, 2018

An ArrayView is to an Array as a slice is to a Vec, so I think it makes sense to implement the indexed parallel iterator on the array view; we don't have anything separate from array views that handles an &Array.

@cuviper (Contributor) commented Dec 3, 2018

Right, but if I understand your earlier comment about preserving locality, a general ArrayView is not necessarily contiguous, whereas a full &Array would be contiguous and could more easily support arbitrary splits, no?

All I'm suggesting is:

// (with lots of omissions...)
impl IntoParallelIterator for &Array {
    type Iter = Parallel<&Array>;
}
impl ParallelIterator for Parallel<&Array> {
    fn drive_unindexed<C>(self, consumer: C) -> C::Result {
        // A delayed view conversion lets you do something else later
        self.0.view().into_par_iter().drive_unindexed(consumer)
    }
}

@bluss (Member, Author) commented Dec 3, 2018

Aha, that makes sense. We'll have to look at whether we can use that for extra features. In general, arrays and views have about the same properties in everything, including support for custom strided axes, so I'm not sure I'm grasping the distinction correctly.
